Complex folding pathways in a simple /3-hairpin 
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The determination of tlie folding mechanisms of proteins is critical to understand the topological 
change that can propagate Alzheimer and Creutzfeld-Jakobs diseases, among others. The compu- 
tational community has paid considerable attention to this problem; however, the associated time 
scale, typically on the order of milliseconds or more, represents a formidable challenge. Ab initio 
protein folding from long molecular dynamics (MD) simulations or ensemble dynamics is not feasible 
with ordinary computing facilities and new techniques must be introduced. Here we present a de- 
tailed study of the folding of a 16-residue /9-hairpin, described by a generic energy model and using 
the activation-relaxation technique. From a total of 90 trajectories at 300 K, three folding pathways 
emerge. All involve a simultaneous optimization of the complete hydrophobic and hydrogen bond- 
ing interactions. The first two follow closely those observed by previous theoretical studies. The 
third pathway, never observed by previous all-atom folding, unfolding and equilibrium simulations, 
can be described as a reptation move of one strand of the /3-sheet with respect to the other. This 
reptation move indicates that non-native interactions can play a dominant role in the folding of 
secondary structures. These results point to a more complex folding picture than expected for a 
simple /3-hairpin. 

Key words: the Activation-Relaxation Technique; protein folding; simulations; /3-hairpin; rep- 
tation. 
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INTRODUCTION 

As one of the smallest building blocks of proteins, the 
/3-hairpin and particularly the second /3-hairpin of the 
domain Bl of protein G (referred to as /3-hairpin2) has 
been the subject of many theoretical and experimental 
folding studies. This peptide adopts hairpin structures 
in solution but overall its flexibility precludes the deter- 
mination of a high-resolution NMR solution structure. Ill 
Fluorescence experiments show that this /3-hairpin folds 
in isolation with a time constant of 6 microseconds and 
its folding kinetics is described by the two-state model. "2\ 
Because these data do not provide details of the transi- 
tion and such a time scale cannot be covered by hundreds 
of long molecular dynamics (MD)-trajectories at 300 K in 
explicit solvent, alternative methods have been used 
in order to characterize the thermodynamics and folding 
kinetics of the /3-hairpin2. 

Two folding mechanisms have already been proposed. 
The first mechanism, suggested by statistical mechan- 
ical models and lattice Monte Carlo (MC) simula- 
tions, is that folding starts at the turn and propa- 
gates towards the tail by hydrogen bonding interactions, 
the hydrophobic cluster forming at the end of folding. 
One variant of this mechanism suggested by Langevin 
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dynamics of an off-lattice model is that the formation of 
the hydrophobic cluster is followed by zipping of hydro- 
gen bonds (H-bonds), predominantly starting from those 
near the turn. 6] Another variant suggested by all-atom 
MD simulations is that the /3-hairpin folds beginning at 
the turn, followed by hydrophobic collapse and then H- 
bond formation. 

The second mechanism proposed is that the N- and 
C-termini first approach each other to form a loop, and 
the structure propagates from there. This mechanism 
is apparently independent of all-atom force field details, 
since it has been recognized by ensemble dynamics at 
300 K using implicit solvent, 'sj replica exchange method 
combining MD trajectories with a temperature exchange 
MC process using SPG model solvent, M minimalist 
Go-folding discontinuous MD simulations, unfolding 
simulations, 11, 12] and multicanonical MG simulations 
with an implicit solvent model. |l3j 

Along with the difference in folding dynamics within 
each scenario, three questions are yet to be resolved. The 
first question is when the native H-bond network and hy- 
drophobic core form: (i) the hydrophobic core is being 
formed first, and the H-bonds appear, [TlL[T^ fl3. 14, lj| 
(ii) the H-bonds form first and then the hydrophobic 
core, '5| or (iii) the final hydrophobic core and H-bonds 
form simultaneously, j^, la E| The second question is 
whether helical structures exist during folding process. 
Berne et al., by using the OPLSAA force field, did not 
find evidence of significant helical structures in their 
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simulations at all temperatures studied, j^] while Gar- 
cia and Sanbonmatsu found a helical content of 15% at 
low temperatures, 0| Pande et al. detected short-lived 
semi-helical intermediates at room temperature, Q and 
Irback found a low population of a-helix structure at 273 
K. The third question is whether the previously used 
methods, which fail to detect both pathways, may miss 
other inajor folding pathways, as has been discussed re- 
cently |l7| for ensemble dynamics which uses a large num- 
ber of short MD simulations of only tens of nanoseconds 
and a supercluster of thousands of computer processors. 

To address these issues, we simulate the folding mech- 
anisms of hairpin2 using a previously described model — 
the Optimized Potential for Efficient peptide-structure 
Prediction (OPEP), and sampling tec hniq ue — the 
Activation-Relaxation Technique (ART). OPEP 
can be used to simulate any amino acid sequence and 
works for proteins that do and do not form ordered struc- 
tures in solution. Ordered structures include three-helix 
bundles and three-stranded anti-parallel /3-sheet struc- 
tures, among others. |20j For its part, ART generates tra- 
jectories on the configurational energy landscape, identi- 
fying a series of energy minima separated by first-order 
saddle points. The efficiency of ART is not affected by 
the height of the activation energy barriers or the com- 
plexity of the atomic rearrangements and thus samples 
very efficiently the rugged-energy landscapes of small 
proteins. The OPEP-ART approach has been applied 
recently to study the folding of three sequences adopting 
an a-helix, a three stranded antiparallel /3-sheet and a 
/?-hairpin helix in solution. In this work, 82 folding 
simulations at 300 K start from a fully extended con- 
formation {(p— —180°, tp= 180°) using different random- 
number seeds: 52 use the standard OPEP force field (16 
reaching the folded state) , 20 use a modified set of OPEP 
parameters (6 reaching the folded state), and 10 use a 
biased Go- like potential (10 reaching the folded state). 
To determine the effect of the starting structure, we also 
launched 8 independent runs (4 reaching the folded state) 
at 300 K starting from a semi-helical conformation using 
the standard OPEP force field. 

From a total of 90 trajectories at room temperature, 36 
found the native state providing a detailed picture of the 
folding mechanism. Although all these folding trajecto- 
ries involve a simultaneous optimization of the complete 
hydrophobic and hydrogen bonding interactions, the 36 
folding runs can be described by 3 mechanisms: two of 
them follow closely those observed by previous theoretical 
and computational studies, but the third one represents 
a new folding mechanism for proteins. This mechanism 
can be described as a reptation move of one strand of 
the /?-sheet with respect to the other. These three mech- 
anisms offer a complete picture of the /3-hairpin folding, 
independently of the exact amino acid composition, and 
help reconcile conflicting theoretical data on the hairpin2 
of protein G H H US Il3l | or between various hairpins, 
e.g., the first hairpin of tendamistat [23| and a 11-residue 
model peptide. |23 The existence of these three compet- 



ing mechanisms was presented recently in a short commu- 
nication; here, we offer a detailed description of the 
folding mechanisms in this simple (3 hairpin. Further- 
more, we present the results of new simulations using 
either Go-like potential or semi-helical starting confor- 
mations. 



METHODS 

We have simulated the folding of the C-terminal f3- 
hairpin from protein G (residues 41-56). The sequence 
of the peptide is GEWTYDDATKTFTVTE. The energy 
surface was modeled using the OPEP model and the dy- 
namics was obtained by the activation-relaxation tech- 
nique. 

Activation-Relaxation Technique 

ART is a generic method to explore the landscape of 
continuous energy functions through a series of activated 
steps. The algorithm has evolved considerably over the 
years and here we apply its most recent version, ART 
nouveau, |2^ |2^ which uses a recursion method, the 
Lanczos algorithm, 0| to extract the direction of lowest 
curvature of the landscape leading to a first-order sad- 
dle point. Such an approach provides an efficient way to 
extract a limited spectrum of eigenvectors and eigenval- 
ues without requiring the evaluation and diagonalization 
of the full Hessian matrix. A similar approach was also 
introduced in Ref. |2^ An ART event is defined directly 
in the space of configurations, which allows for moves of 
any complexity, and consists of four steps: 

1. Starting from a local minimum, a configuration is 
first pushed outside the harmonic well until a neg- 
ative eigenvalue appears in the Hessian matrix. 

2. The configuration is then pushed along the eigen- 
vector associated with the negative eigenvalue until 
the total force is close to zero, indicating a saddle 
point. The first two steps constitute the activation 
phase. 

3. The configuration is pushed slightly over the saddle 
point and is relaxed to a new local minimum, using 
standard minimization technique. 

4. Finally, the new configuration is accepted/rejected 
using the Metropolis criterion at the desired tem- 
perature. In each of the simulations at hand, this 
four-step procedure was repeated 4000 times, tak- 
ing less than 18 processor-hours on an IBM Power-4 
machine 

As discussed in our previous work, ^2l\ the tempera- 
ture in ART is not a real temperature since ART samples 
the conformational space from one minimum to another 
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minimum. However, ART generates well-controlled tra- 
jectories (more than 83% of events relax back to within 
0.4 A from their initial minima starting at the saddle 
points). |23| A detailed description of the algorithm and 
implementation of ART can be found in earlier publica- 
tions. [11 in EH 

Energy Model 

We use a coarse-grained off-lattice model where each 
amino acid is represented by its N, H, Ca, C, O and one 
bead for its side chain. The exact OPEP energy function, 
which includes solvent effects implicitly, was obtained by 
maximizing the energy of the native fold and an ensemble 
of non- native states for six training peptides with 10- 
38 residues. In this work, the side chain propensities of 
the 20 amino acids for helix, /3-strand and helix 
I29I Isoj l are neglected. The total energy is thus expressed 
by: 

E = wlEl + wrEhbi + whhEhb2 + wscscEscsc 
+wscmEscm + ujm,mEm,m (1) 

The interaction potentiel OPEP is a function of the 
weights w's of the following interactions: 

(i) quadratic terms to maintain stereochemistry: bond 
lengths and bond angles for all particles and improper 
dihedral angles for the side chains and the peptide bonds 
El, 

(ii) and excluded-volume potential of the main chain in- 
teractions Em,m and of side chain-main chain interac- 
tions Esc,M, 

(iii) pairwise contact 6-12 interactions between the side 
chains considering all 20 amino acid types Esc.sc, 

(iv) backbone two-body Ehbi and four-body Ehb2 hy- 
drogen bonding interactions. All nonbonded interactions 
are included (no truncation). 

The two-body energy of one H-bond between residues 
i and j is defined by 

Ehbi = Shb^ fJ-{rij)iy{aij) (2) 

where 

Knj) = H-r-H-r (3) 

1 otherwise 

where r^j is the 0..H distance between the carbonyl oxy- 
gen and amide hydrogen and the NHO angle. 

The cooperative energy between two neighbored H- 
bonds ij and kl is defined by 

Ehb2 = e2/i6exp(-(ry -cr)^/2)exp(-(rfci -cr)V2) 
A{ijkl) (5) 



where A{ijkl) = 1 if (k,l) = (i+1, j+1) or (i-|-2, j-2) 
or (i-l-2, j-l-2), otherwise A{ijkl) ~ 0. This corresponds 
to the pattern of H-bonds in a-helices, anti-parallel and 
parallel beta-sheets, respectively. 

In this work, unless specified we use a = 1.8 A, Shb = 
1.0 kcal/mol if j=i-l-4 (helix), otherwise = 1.5 kcal/mol, 
and e^hb = —2.0. The other parameters can be found in 
Ref.lM 



Trajectory Analysis 

Our native structure contains six main chain H-bonds 
excluding the one at the turn since it rarely forms because 
of geometrical constraints. Following Karplus and 
Garcia, (14] they are numbered from the tail to the turn 
42:HN-55:0 (HI), 55:HN-42:0 (H2), 44:HN-53:0 (H3), 
53:HN-44:0 (H4), 46:HN-51:0 (H5), 51:HN-46:0 (H6). 
The expression i:HN-j:0 denotes the atoms involving a 
H-bond between residues i and j. 

To characterize a conformation, we use the number of 
native H-bonds, the radius of gyration of the hydropho- 
bic core {Rgcore), and the Cc-rmsd of residues 41-56 from 
the 2GB1 structure. [U Rgcore is calculated using the 
side chains of the four residues W43, Y45, F52, and V54. 
A H-bond is defined if it satisfies DSSP conditions: [s^ 
the distance between the carbonyl oxygen and amide hy- 
drogen (0..H) is less than 2.4 A and the NHO angle is 
greater than 145°. 

The regions of conformational space that have been 
sampled by all simulations are clustered as follows. The 
Ctj-rmsd is calculated for each pair of structures in each 
simulation. The number of neighbors is then computed 
for each structure using a C^-rmsd cutoff of 1.5 A. The 
conformation with the highest number of neighbors is 
considered as the center of the first cluster. All the neigh- 
bors of this conformation are removed from the ensemble 
of conformations. The center of second cluster is then de- 
termined in the same way as for the first cluster, and this 
procedure is repeated until each structure is assigned to a 
cluster. Then we have a list of central structures of clus- 
ters for every folding simulation. Once all the trajectories 
are clustered, we can cluster all the central structures in 
two different folding trajectories, in order to identify the 
common clusters between the simulations. 



RESULTS 

Native vs. Non-native Hairpin Structures 

Cluster analysis of all ART-generated trajectories 
shows that the lowest-energy conformation {E = — 33 
kcal/mol) deviates by less than 1 A C'a rms from 
the hairpin structure within protein G (PDB code 
2GB1 _31j). For the purpose of our simulation, we de- 
fine a structure as being fully folded - or native - if it 
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satisfies the following four criteria: (i) the six native H- 
bonds are formed (see Methods); (ii) the backbone dihe- 
dral angles (0, ip) have standard /3-sheet values (around 
(—90°, 150°)); (iii) the hydrophobic core is well packed 
(core radius of gyration is around 4.3 A) and (iv) the 
all-residue Ca rmsd from 2GB1 is less than 2.5 A. These 
conditions for nativeness are thus more stringent than 
those of in previous folding simulations. 0| For Za- 
grovic and collaborators, (Sj the H-bonds connecting the 
two strands are not required to be native and hairpin 
with asymmetric strands are considered as native struc- 
tures. Similarly, for Zhou et al, [lol | the condition for 
nativeness is that all-heavy atom rmsd from the global 
minimum structure is less than 2.5 A. The rmsd between 
their modeled structure and the experimental 2GB 1 is 
not given. 

Figure ^ (produced by using the MOLMOL software 
[s^l) shows the 2GB1 structure (a), our native structure 
(b) and six non-native hairpin structures (c-h) sampled 
by ART. Figure^e) and H^h) show two hairpins in which 
the turn (residues 7-11) is shifted toward C terminus. 
Figure^f) and^g) show two non-native hairpins where 
the turn (residues 6-10) is shifted toward the N terminus. 
In this study, only hairpin (b), of lowest energy, is the 
native state, while, according to the definition of folded 
state in Ref. hairpin structures (c)-(h) are also folded 
states. Our hairpin structures (c) and (d) are very similar 
to the folded state in Series 17 (two key H-bonds are 
HB53-44 and HB52-45) and Series 2 (two key H-bonds 
are HB45-52 and HB43-54) of Ref. H respectively. Both 
of them have symmetric strands, but different H-bond 
network pattern. Moreover, from the key H-bonds in 
their eight independent folding trajectories, it seems that 
the hairpin structures in Series 1, 7, 9, 11 are asymmetric 
about the /3-turn. 



Analysis of Unfolded Trajectories 

From a total of 90 runs, 36 trajectories reach the na- 
tive state. The other 54 fail to locate the native structure 
within 4000 ART-events. These runs lead either to non- 
native hairpin conformations as discussed previously or 
to other metastable conformations of various secondary 
compositions, e.g. a-helix with coil , three-stranded an- 
tiparallel /3-sheet and short a-helix with /3-like structures. 

Figure El shows the structural features of the 8 low- 
est energy metastable structures sampled. The 54 
metastable states lie between —21 and —30 vs. —33 
kcal/mol for the native state. It is important to note that 
these structures do not represent dead ends for the sim- 
ulation; continuing the simulation at 300 K for two arbi- 
trarily chosen metastable states (c) and (d) , we find that 
it is possible to reach the fully formed hairpin structure 
within 5000 additional trial events. In Fig.|31 we plot the 
energy vs. rmsd for the lowest-energy structures in all the 
52 simulations (16 folded and 36 unfolded) using the stan- 
dard OPEP and starting from the fully extended state. 



We see that all folded structures appear in a dense region 
around —33 kcal/mol and below an rmsd of 2.0 A and 
are well separated from the non-folded ones; clearly, our 
potential can discriminates folded states from metastable 
states. By visual inspection of the 36 metastable states 
and comparison with Fig.O we see (i) that the conforma- 
tions with E between —25 and —30 kcal/mol and rmsd 
between 2.0 and 4.1 A adopt asymmetric /3-sheet struc- 
tures with different H-bond networks, (ii) that the con- 
formations with E between —25 and —28 kcal/mol and 
rmsd between 6.0 and 8.5 A show three-stranded antipar- 
allel /3-sheet structures, (iii) and that the other conforma- 
tions with E between —21 and —25 kcal/mol and rmsd 
between 3.0 and 8.0 A adopt a-helix with coil, short /3- 
hairpin with coil, or short a-helix with /3-like structures. 
As the rmsd of some non-native hairpin structures can be 
as small as 2.1 A (see Fig.|5Jb)), it is clear that this sole 
criterion is not sufhcient to differentiate between native 
and non- native hairpin structures. 



Analysis of Folded Trajectories 

As reported in Ref.i24 the 16 folding simulations using 
the standard OPEP potential which start from a fully 
extended state can be classified into 3 mechanisms : I (4 
runs), II (7 runs) and III (5 runs). 

A detailed analysis of mechanism I, seen in four fold- 
ing trajectories, is given in Figure This mechanism 
is similar to that described in Refs. 0, 0, |23 Starting 
from a fully extended state, the peptide first collapses 
into a compact state with the turn placed in the right 
section of the chain (residues 7-10) (Events 53); this step 
is characterized by the formation of a partially packed 
hydrophobic core (radius of gyration of the hydrophobic 
core, Rgcore, is 4.8 A, see Fig.^Jb)) and the appearance 
of several non-native H-bonds (Fig.0Jc)). The following 
steps serve to stabilize its hydrophobic core. At event 
80, a native H-bond near the turn (H5) forms, followed 
rapidly by the formation of H4; 29 events later (event 
109), the peptide reorganizes its hydrophobic core to a 
well packed state {Rgcore reaches its final value 4.3 A). 
The reorganization of the hydrophobic core allows the 
formation of new native main chain H-bonds (H3, H2, 
H6). At that point, the two-end distance fluctuates 
around 9 A (Fig.^b)). In spite of these new native H- 
bonds, the flexibility of the loop remains large and H6, 
an H-bond near the loop, breaks and reforms many times 
between events 128 and 300 (see Run 1 in Fig. (S)). Fi- 
nally, the H-bond HI, near the end of the peptide, forms 
at last (event 471) leading to the native state: the core 
radius of gyration remains its final value 4.3 A, the rmsd 
from 2GB1 structure drops to 2.0 A (Fig. I^b)), the to- 
tal number of native H-bonds is six, and the total energy 
is —32 kcal/mol (Fig. 0fc)). The time formation of the 
six native H-bonds H1-H6 can be seen in Fig. O for the 
four folding runs. The number of native and non-native 
H-bonds in each accepted conformation as a function of 
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FIG. 1: The sampled hairpin structures with different main chain H-bonds and 2GB1 structure, (a) 2GB1 structure; (b) 
Hairpin structure with 6 native H-bonds (H1-H6), rmsd =1.09 (0.88) A, E = -33 kcal/mol; (c) hairpin structure with 5 key 
H-bonds (H1-H4, 52:HN-45:0), rmsd = 2.87 (2.15) A, E = -29 kcal/mol; (d) hairpin structure with 4 key hydrogen bonds 
(43:HN-54:0, 45:HN-52:0, 52:HN-45:0, 54:HN-43:0), rmsd = 2.62 (2.39) A, E = -28 kcal/mol; (e) hairpin structure with 4 key 
H-bonds (44:HN-54:0, 46:HN-52:0, 52:HN-46:0, 54:HN-44:0), rmsd = 2.08 (1.87) A, E = -30 kcal/mol; (f) hairpin structure 
with 5 key H-bonds (43:HN-53:0, 45:HN-51:0, 51:HN-45:0, 53:HN-43:0, 45:HN-41:0), rmsd = 2.42 (2.18) A, E = -29 kcal/mol; 
(g) hairpin structure with 4 key H-bonds (42:HN-54:0, 44:HN-52:0, 52:HN-44:0, 54:HN-42:0), rmsd = 2.40 (1.98) A, E = 
-28 kcal/mol; (h) hairpin structure with 4 key H-bonds (43:HN-55:0, 45:HN-53:0, 53:HN-45:0 , 55:HN-43:0), rmsd = 3.56 
(2.96) A, E = -28 kcal/mol. The rmsd value in parentheses is the Ca-rmsd from 2GB1 for residues 43-54. According to our 
definition of folded state, only hairpin (b) is folded state. 




(a) (b) (e) (d) 




(e) (f) is) (h) 

FIG. 2: The sampled metastable states using the standard OPEP potential, starting from a fully extended state, (a) hairpin 
structure with 4 key H-bonds (43:HN-54:0, 45:HN-52:0,52:HN-45:0, 54:HN-43:0), rmsd = 2.62 A, E = -28 kcal/mol; (b) 
hairpin structure with 4 key H-bonds (44:HN-54:0 , 46:HN-52:0, 52:HN-46:0, 54:HN-44:0), rmsd = 2.08 A, E = -30 kcal/mol; 
(c) hairpin structure with 4 key H-bonds (44:HN-54:0 , 46:HN-52:0, 52:HN-46:0, 54:HN-44:0), rmsd = 2.88 A, E = -27 
kcal/mol; (d) hairpin structure with 5 key H-bonds (43:HN-53:0, 45:HN-51:0, 51:HN-45:0 , 53:HN-43:0, 45:HN-41:0), rmsd 
= 2.6 A, E = -29 kcal/mol; (e) hairpin structure with 4 key H-bonds (44:HN-55:0 , 46:HN-53:0, 53:HN-46:0, 55:HN-44:0), 
rmsd — 3.34 A, E = -23 kcal/mol; (f) a a-f3 structure with short helix appearing in the C terminal, rmsd = 7.39 A, E = -22 
kcal/mol; (g) an Q-helix structure involving residues from 5 to 15, rmsd = 5.89 A, E = -23 kcal/mol; (h) a three-stranded 
beta-sheet structure with 4 H-bonds (42:HN-49:0, 49:HN-42:0, 48:HN-54:0 , 54:HN-48:0), rmsd = 7.81 A, E = -27 kcal/mol. 
Two metastable state (c) and (d) converge to the native state within 5000 additional trial events. 



event number is given in Fig. ^c). We clearly see that 
the folding process is a competition between native and 
non-native hydrogen bonding interactions, and that in 
Mechanism I the two ends of this /3-hairpin gradually 
come near to form HI last, and the hydrophobic core 
becomes well packed rapidly. 

Mechanism II, seen in seven folding tra ject ories, was 
observed in simulations of hairpin2 Ifll. and of a 
10- residue model peptide, js^ Figure shows the major 



folding steps associated with this mechanism. Within a 
few events, the hydrophobic interaction between the four 
residues W43, Y45, F52, and V54 induces the formation 
of a partial hydrophobic core {Rgcore is 6.4 A) result- 
ing in a globular state with three non-native H-bonds 
(event 55). At the same time, the N- and the C-termini 
approach each other to form a large loop. Because of 
the competition between the hydrophobic, native, and 
non-native hydrogen bonding forces, the peptide rear- 
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FIG. 3: Rmsd as a function of energy of the lowest-energy 
structures generated in the 52 simulations (16 folded and 36 
unfolded) using the standard OPEP potential, starting from 
a fully extended state. Circles, triangles, and crosses are for 
native /3-hairpin, asymmetric /3-hairpin with different H-bond 
pattern, and other different structures, respectively. 



ranges its hydrophobic core by breaking and reforming 
some non-native interactions {Rgcore increases to 8 A). 
At event 191, the hydrophobic core drops to 5.5 A and, 
after 16 more events, the H-bond HI (event 207) near 
the end forms. This reorganization of hydrophobic core 
causes the second (H2 at event 366) and third (H3 at 
event 417) H-bonds to form. After a 150-event optimiza- 
tion between hydrophobic and hydrogen bonding inter- 
actions, the radius of gyration of the hydrophobic core 
reaches its final value (4.3 A at event 567), and the fourth 
H-bond H4 appears. In the following several events, H5 
(at event 570) and H6 (at event 573) form; the peptide 
satisfies the native conditions (see Fig. 0). Fig. |7| shows 
the time formation of the six native H-bonds H1-H6 in 
the seven folding trajectories. As is shown in Fig.|H|and 
Fig. the two ends of this /3-hairpin come near to form 
H-bond HI first (and to a lesser extent H2 first), and 
the hydrophobic core reorganizes to its well packed state 
slowly. The partial helical structure (Events 169 and 280) 
is not a necessary intermediate and appears in 4 of the 7 
trajectories following mechanism H. Comparing the two 
mechanisms, one can see that they are not mutually in- 
compatible; many trajectories fall somewhere in between 
these two descriptions. In some cases, for example, fold- 
ing is initiated in the middle region (H4). The /3-sheet 
then propagates first outwards (forming H1-H3) and then 
inwards (forming H5-H6) (see Run 6 in Fig. [71). 

Mechanism HI had not been observed in previous all- 
atom folding, IM, hO| unfolding, |llj| and equilibrium sim- 
ulations. 0, Il3l Il4l | This mechanism, seen in five fold- 
ing trajectories, is characterized by a rapid folding into 
a collapsed state with a turn at the wrong place, form- 



ing an asymmetric hairpin structure stabilized by non- 
native H-bonds and a partially packed hydrophobic core. 
Then slowly, step by step in a reptation mode, the asym- 
metry is corrected, with non-native H-bonds breaking 
and reforming, in a structure getting closer to the na- 
tive hairpin. A representative trajectory is given in 
Fig. |S1 Folding begins with the formation of a compact 
state defined by a partially packed hydrophobic core and 
two non-native H-bonds 46:HN-55:0 and 48:HN-53:0 
(event 55). Then the number of non-native H-bonds in- 
creases to four: 46:HN-55:0, 48:HN-53:0, 53:HN-48:0, 
55:HN-46:0, and a short /3-sheet structure (Event 84) 
appears. Driven by the hydrophobic interactions, at 
event 99, the reptation motion of the loop causes the 
four non-native H-bonds to break, and four new non- 
native H-bonds 44:HN-55:0, 46:HN-53:0, 53:HN-46:0, 
55:HN-44:0 form. This peptide shows a new asymmet- 
ric /3-sheet structure, which is closer to its symmetric 
/3-hairpin structure. During the next 200 events, this 
peptide stays in this state and reorganizes its hydropho- 
bic core. After an optimization of the hydrophobic and 
hydrogen bonding interactions, at event 302, the repta- 
tion motion of the loop enhances longitudinal motion of 
the two strands, breaking the four non- native H-bonds 
and forming four native ones (H1-H4); simultaneously, 
the core radius of gyration drops to 4.7 A. The hairpin 
structures forms rapidly afterwards, with the addition of 
the fifth and sixth native H-bond (H5 and H6), a rmsd 
dropping to 1.4 A (Event 334), and the energy approach- 
ing to -33 kcal/mol. Once the native hairpin forms, it 
is fairly stable: the total number of H-bonds remains at 
six, the radius of gyration of the hydrophobic core keeps 
around 4.3 A, the rmsd fiuctuates around 1.4 A, and the 
energy fluctuates around —33 kcal/mol. As shown in 
Figure |H1 and Fig. O the critical and rate-limiting step 
in this mechanism is the breaking, almost in synchrony, 
of four or five non-native H-bonds of a structure very 
close to the native state, followed by a rapid formation 
of the native H-bonds. It can also be seen from Fig. |S1 
that the two ends of this /3-hairpin slowly come near, and 
the hydrophobic core reorganizes to its well packed state 
slowly. 



DISCUSSION 

Sensitivity of Trajectories 

Because various interaction potentials have led previ- 
ously to confiicting reports regarding the details of fold- 
ing, it is essential to determine how changes in the force- 
field and the starting structure affect the folding trajec- 
tories. 

We consider here two variations of the force field. 
Firstly, we run 20 simulations using OPEP with an en- 
ergy parameter for the H-bond Shb increased from the 
standard value of 1.5 to 2.5, a change which could consid- 
erably affect the stability of intermediate structures. Re- 
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FIG. 4: Detailed analysis of a representative folding trajectory following Mechanism I, simulated at 300 K, starting from a 
fully extended state, (a) Six snapshots. Only accepted events are shown, (b) Ca-rmsd from 2GB1 structure of the hairpin, 
radius of gyration of the hydrophobic core Rgcore, and the two-end distance as a function of accepted event number, (c) Total 
energy, number of all the native and non-native H-bonds in each sampled conformation as a function of accepted event number. 



markably, we still recover the three folding mechanisms 
among the six successful trajectories. Secondly, we run 
10 simulations using ART with Go-like OPEP potential 
{ E = E{OPF.P) + k*iTias(f), for k = 0.5, 0.55 and 0.6) 
which favors native contacts in the total energy. Starting 
from a fully extended state, all ten simulations found the 
folded state following either mechanism I or II: six sim- 
ulations fold from the end region (HI or H2 form first), 
two from the middle (H3 forms first) and two from the 
turn (H5 first). As in earlier Go- model simulations, ^3 



we find no helical intermediates in any of the ten folding 
simulations. In these simulations, folding can be initi- 
ated at the end region, middle region, and turn region 
of the peptide, in agreement with the results obtained 
by using only the OPEP potential. Because native in- 
teractions are strongly favored in Go-model simulations, 
the asymmetric conformations found in mechanism III 
become prohibitive, preventing the appearance of mech- 
anism III. The bias of the Go potential can therefore lead 
to an incomplete sampling of the folding paths. 
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FIG. 5: Status of the six native H-bonds as a function of 
accepted event number in the four runs foUowing Mechanism 
I. Green: not formed, red: formed. Run 1 is the simulation 
described in Fig. 4. 



To determine the impact of the starting conformation 
on the ART-trajectories and to address the question of 
semi-hehcal intermediates during folding, [H eight trial 
simulations were attempted at 300 K starting from a 
semi-helical structure. Four of those resulted in hairpin 
structures with six native H-bonds, while the remain- 
ing four displayed helical structures or /3-hairpin struc- 
tures with non-native H-bonds. Interestingly, the four 
folded trajectories closely follow Mechanism II. FigurelTUI 
gives a representation of folding simulation at 300 K. The 
semi-helical structure rapidly relaxes to a more compact 
structure (event 92) in which the two ends of the peptide 
approach each other. At this stage, a partially packed hy- 
drophobic core {Rgcore is 6.7 A) appears, without native 
H-bonds, however. Driven by the strong hydrophobic in- 
teractions among the four hydrophobic residues (Tyr45, 
Phe52, Trp43, Val54), the two ends of the peptide come 
nearer and one native H-bond (HI) is formed at the end 
region of the peptide. After 50 events, the next two H- 
bonds H2 and H3 form almost simultaneously. The pep- 
tide stays in this state for 350 events, reorganizing slowly 
the hydrophobic core {Rgcore oscillates around 5.2 A and 
the rmsd from the 2GB 1 structure reaches a plateau (4 
A)). After a slow structural adjustment process, the hy- 
drophobic core forms fully in a large cooperative move 
at event 500 (Rgcore drops to 4.3 A), and the six na- 
tive H-bonds set in rapidly afterwards (H4 and H5 form 
first, followed by the formation of H6). This event marks 
the completion of the folding process of the peptide with 
the rmsd reaching to 1.5 A. This demonstrates that ART 



can find the folded state starting from a helical struc- 
ture and that this helical structure may exist in the real 
folding pathway of this protein, as discussed in previous 
simulations 8, 14, 16J and recent energy landscape char- 
acterization of /3-hairpin2 and its isomers 35] . 

From the structure of the initial helical state, we can 
explain why the four folded trajectories only follow Mech- 
anism II. This state is characterized by a helical structure 
spanning residues 47-50, and two non-native H-bonds 
(50:HN-46:O and 51:HN-47:0). The existence of this he- 
lical segment makes it difficult to form native H-bonds 
at the turn or the middle region of the peptide because 
of geometric restrictions. However the two ends of this 
peptide are very flexible, driven by the hydrophobic inter- 
actions, they can approach each other easily, then form 
a H-bond first between them. When the initial state is a 
fully extended state, this peptide has much more freedom 
to find its native state, and this is why multiple folding 
pathways are present. 



Cluster Analysis of the Folded Trajectories 

As we have seen in the previous section, the folding 
process can be described by a single concept: the com- 
petition between three types of interactions. A cluster 
analysis shows, however, that this does not mean that 
the folding trajectories can be unified. 

We perform a cluster analysis following the procedure 
described in Ref. |3^ (see Methods) , using a Ca-rmsd cut- 
off of 1.5 A. All the accepted conformations in each folded 
trajectory of the 26 folding simulations obtained by the 
standard OPEP potential are used for this procedure. A 
total of 12-40 clusters were found for each folded tra- 
jectory, indicating strong variations in the details of the 
folding trajectories. Moreover, we find very little over- 
lap of the basins between similar trajectories: except for 
the trivial initial and native clusters, it is generally not 
possible to match more than one or two clusters between 
trajectories following the same mechanisms. We obtained 
the same qualitative results using other clustering anal- 
ysis. 

The failure of cluster analysis by Ca-rmsd to char- 
acterize the three different folding mechanisms for this 
/3-hairpin is due to the flexibility of this small peptide. 
For example, the asymmetric /3-hairpin structures in the 
folding trajectory following the reptation mechanism are 
very diverse, presenting a wide range of hydrogen-bond 
patterns. Moreover, the Ca-rmsd between a beta hairpin 
with the turn shifted to C-terminal and a beta hairpin 
with the turn shifted to N-terminal is as big as 4.3 A. 
Increasing the Ca-rmsd cutoff will make it difficult to 
differentiate asymmetric /3-hairpin state from the folded 
state. The classification in terms of folding mechanisms 
as identified by the formation of hydrogen bonds appears 
therefore superior to the clusterization method for this 
small peptide. 
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FIG. 6: Detailed analysis of a representative folding trajectory following Mechanism II, simulated at 300 K, starting from a 
fully extended state, (a) Six snapshots, (b) Ca-rmsd from 2GB1 structure of the hairpin, radius of gyration of the hydrophobic 
core Rgcore, and the two-end distance as a function of accepted event number, (c) Total energy, number of all the native and 
non-native H-bonds in each sampled conformation as a function of accepted event number. 



Does the Reptation Folding Trajectory Exist? 

Surprizingly, although present in part in many previ- 
ous simulations, the reptation mechanism had not been 
identified previously. The asymmetric conformations 
with only non-native H-bonds, which characterize the in- 
termediate states in Mechanisms 111, are found in many 
simulations. For example, a cluster analysis of the struc- 
tures produced in an all-atom multicanonical MC sim- 
ulation of the /3-hairpin2 finds that these asymmetric 
conformations account for 20% of all conformations. Iia| 



Similarly, Skolnick and collaborators find that a lattice 
model often folds into asymmetric structures, i In a re- 
cent work of Irback, a local minimum corresponding to 
a /3-hairpin with non-native topolog y is observed in the 
energy landscape of this /3-hairpin2. [l^ Finally, most of 
the folded conformations identified by Zagrovic et al. in 
distributed MD simulations (Folding@home) seem asym- 
metric although this is not explicitly stated. [H Simi- 
lar results are found in smaller peptides such as the 11- 
residue model peptide of Wang et al. studied by molecu- 
lar dynamics, which also displays asymmetric /3-hairpin 
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FIG. 7: Status of the six native H-bonds as a function of 
accepted event number in the seven runs foUowing Mechanism 
II. Green: not formed, red; formed. Run 11 is the simulation 
described in Fig. 6. 

structures. [2^ 

Although many simulations found the asymmetric con- 
formation, none seems to have been able to overcome the 
rate-limiting step, which requires breaking all non-native 
bonds at once (Fig. |H1 (b)), in order to form the native 
state. With OPEP, this barrier is found to be about 12 
kcal/mol and corresponds to a time scale on the order 
of /zs, (37] near the experimental folding time but much 
beyond what can be reached by standard molecular dy- 



namics. 

In addition to display a time scale in agreement with 
experiment, there is experimental evidence that many 
/?-hairpin sequences can populate two distinct hairpin 
conformations of various loop lengths and pairings of /3- 
strands in solution. '33,l39( Fluorescence microscopy also 
suggests that myosin can induce the reptation of actin fil- 
aments when adenosine triphosphate is added. |0| The 
reptation mechanism, which has been well established as 
a fundamental movement in polymer chain, might there- 
fore exist in the folding of a simple /3-hairpin. 

Competition between Hydrophobic and Native 
Hydrogen Bonding Interactions 

The folding mechanisms described above are the result 
of a strong competition between hydrophobic and native 
hydrogen-bonding interactions. All cases show that the 
hydrophobic interactions play a dominant role in the fold- 
ing process. At the beginning of the folding, a partially 
packed hydrophobic core always forms before native H- 
bonds appear (see Fig. ^b) and (c), Fig. El^b) and (c), 
Fig.EIb) and (c), Fig.HUb) and (c)). The following step 
is the rearrangement of the hydrophobic core and the 
optimization between the complete hydrophobic and na- 
tive hydrogen bonding interactions. When hydrophobic 
and hydrogen bonding interactions reach a balance, the 
well packed hydrophobic core forms before (Fig. EJb) and 
(c)) or at the same time as (Fig. Efb) and (c), Fig.lHfb) 
and (c). Fig. llOf b) and (c)) the native H-bonds network 
forms. 



Competition between Native and Non-native 
Hydrogen Bonding Interactions 

Native and non-native hydrogen-bonding interactions 
also compete strongly during the folding process. The 
initial collapse is always accompanied by the formation of 
three to six non-native H-bonds (see Fig. ^c), Fig.EJ^c), 
Fig. 13b), Fig. Cnic)). However, these non-native H- 
bonds are not stable in the long run; they form, break, 
and reform in response to the movement of the hydropho- 
bic core. Driven by the hydrophobic interactions, the 
non-native hydrogen bonding interactions finally become 
weaker, and native H-bonds form, leading rapidly to the 
native state. The whole folding process can therefore be 
described as a balance between hydrophobic, native and 
non-native hydrogen bonding forces. 

CONCLUSIONS 

By demonstrating that the folding of a /^-hairpin can 
be initiated at the end, the middle and the turn region, as 
well as from an asymmetric conformation, the three fold- 
ing mechanisms proposed here help reconcile conflicting 
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FIG. 8: Detailed analysis of a folding trajectory following Mechanism III, simulated at 300 K, starting from a fully extended 
state. Another folding trajectory was presented in a short communication. [23 | (a) Six snapshots, (b) Ca-rmsd from 2GB1 
structure of the hairpin, radius of gyration of the hydrophobic core Rgcore, and the two-end distance as a function of accepted 
event number, (c) Total energy, number of all the native and non-native H-bonds in each sampled conformation as a function 
of accepted event number. 



theoretical data on the hairpin2 of protein G P.I^. IT(1IT^ 

or between various hairpins, e.g. the first hairpin of ten- 
damistat. j23 | Using these three mechanisms, we can now 
propose a complete picture of the folding of /3-hairpins 
which does not depend on the exact amino-acid com- 
position. The exact folding path followed by a given 
/3-hairpin should be influenced by its sequence and the 
solvent conditions; all paths should, however, belong to 
one of the three mechanisms presented here. The first 
two mechanisms, with the propagation from either the 
turn or the end points, had already been described in 



previous reports on /3-hairpin 2, but no previous method 
had managed to detect both pathways. The third mech- 
anism, identified in these simulations for the first time, 
involves folding into an asymmetric state followed by a 
reptation of one strand over the other until the peptide 
reaches its native state. The existence of this mechanism 
is suppported by a number of experimental and numer- 
ical results even though the rate-limiting step and the 
presence of non-native states place it outside the scope 
of most simulation methods, including unfolding. Go and 
standard MD approaches. This last results underlines the 
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importance of direct non-biases methods, such as ART, 
for studying the folding process. 



Although complex, presenting a large number of dif- 
ferent paths, the folding of this /3-hairpin can still be 
described by a unique process of competition between 
hydrophobic core and native and non-native H-bond in- 
teractions. In particular, it is clear that non-native H- 
bond interactions can play a critical role in the folding 
process even though they are absent in the final product. 
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FIG. 9: Analysis of all trajectories following Mechanism III. 
(a) Rmsd from the 2GB1 structure, (b) number of non-native 
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than 1.8 A (see (a)) in the five folding simulations. In the 
five folding simulations, on average, the non-native H-bonds 
break almost at the same time (see(b)), then the native H- 
bonds form rapidly, almost instantaneously (see(c)). 
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